Nattoku in Vector space

https://gyazo.com/e5b07f1678a2b691aa89e3f8aa4baf74

In the lightning talk, I didn't delve into mathematical discussions due to limited time. For a more detailed explanation, I created diagrams by embedding the meanings of words into vector spaces using LLM.

https://pbs.twimg.com/media/GDC8BeGaEAAehIG?format=jpg&name=medium#.png

Lightning talk: Words as Public Goods

nishio This is a two-dimensional visualization of the meanings of each word embedded in a high-dimensional space using OpenAI's text embedding API. In simple terms, it demonstrates how AI recognizes the similarity in meanings of words like this.

Embedding API

nishio Plotting two languages on a single chart is not a straightforward task. In this chart, the first principal component axis of PCA is treated as the axis representing the differences between languages and has been removed.

nishio Here is annotated version. The plotted words are a combination of those I have considered and those that GPT-4 has suggested as being similar. So it shows GPT4 can not find English words similar to Japanese Nattoku. Understanding and agreement is major explanation in dictionaries

https://pbs.twimg.com/media/GDDAKtyakAAbT68?format=jpg&name=medium#.png

nishio One word can bridge multiple concepts. In this example, the Japanese word "納得" (nattoku) serves as a bridge connecting concepts like "understanding", "agreement", and "satisfaction". Similarly, in Mandarin, "數位" (shùwèi) connects concepts like "digital" and "plural".

nishio In the mapping from a high-dimensional space(H) to a low-dimensional space(L), objects that are close in H will generally remain close in L. However, there is no guarantee that objects far apart in H will also be far apart in L.

nishio You can think of it like imagining the shadow of a three-dimensional object. Therefore, the absence of proximity in a low-dimensional space can be useful for understanding a high-dimensional space.

Making

https://gyazo.com/f3290610ea13f77d41522219f97f553f

Simple PCA generated this

it shows thr difference of languages

https://gyazo.com/5e13f13280ccb8fc656d85026fd95008https://gyazo.com/4505b5a3297c33f476a6969beabc02a7

Visualization on each languages

Those are good for observing each language only, but those should not be overlapped, by the nature of PCA.

In the observation I found some candidate words are far from other words. Those outlier are ommitted from the visualization. We have only two~three dimensions to express information.

https://gyazo.com/f0a155a556427ce3e97fb147094a576a

In this chart, the first principal component axis of PCA is treated as the axis representing the differences between languages and has been removed.

Finnaly got this

https://gyazo.com/f546d4ec99b272609d99dc4558dbd215